Segment-Based Acoustic Models for Continuous Speech Recognition
Authors
Abstract
Tied-mixture (or semi-continuous) distributions are an important tool for acoustic modeling, used in many high-performance speech recognition systems today. This paper provides a survey of the work in this area, outlining the different options available for tied-mixture modeling, introducing algorithms for reducing training time, and providing experimental results assessing the trade-offs for speaker-independent recognition on the Resource Management task. Additionally, we describe an extension of tied mixtures to segment-level distributions, …
…ity of acoustic observations conditioned on the state in hidden Markov models (HMMs) or, for the case of the SSM, conditioned on a region of the model. Some of the options that have been investigated include discrete distributions based on vector quantization, as well as Gaussian, Gaussian-mixture, and tied-Gaussian-mixture distributions. In tied-mixture modeling, distributions are modeled as a mixture of continuous densities, but unlike ordinary, non-tied mixtures, where the component Gaussian densities are estimated separately for each mixture, each mixture is constrained to share the same component densities, with only the weights differing. The probability density …
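To make the tied-mixture constraint concrete: writing the K shared Gaussians as N(x; μ_k, Σ_k) and the state-specific weights as c_{s,k}, the state-conditional density is p(x | s) = Σ_k c_{s,k} N(x; μ_k, Σ_k), where only the weights c_{s,k} depend on the state s. The Python sketch below evaluates this formula under stated assumptions: diagonal covariances, weights given in advance (no training step), and a hypothetical `TiedMixtureModel` class and helper functions whose names are illustrative only. It is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance (per-dimension variances in `var`)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class TiedMixtureModel:
    """Hypothetical sketch: K Gaussians shared by every state; each state owns only its weights."""

    def __init__(self, means, variances, state_weights: Dict[str, List[float]]):
        self.means = means                  # K shared component means (assumed)
        self.variances = variances          # K shared diagonal variances (assumed)
        self.state_weights = state_weights  # state -> K mixture weights summing to 1

    def component_log_likelihoods(self, x) -> List[float]:
        # Evaluated once per observation frame and reused by every state.
        return [log_gauss_diag(x, m, v) for m, v in zip(self.means, self.variances)]

    def state_log_likelihoods(self, x) -> Dict[str, float]:
        comps = self.component_log_likelihoods(x)
        scores = {}
        for state, weights in self.state_weights.items():
            # log p(x | s) = log sum_k c_{s,k} N(x; mu_k, Sigma_k); zero weights are skipped.
            terms = [math.log(w) + c for w, c in zip(weights, comps) if w > 0.0]
            scores[state] = log_sum_exp(terms)
        return scores


# Toy usage: two shared 2-D Gaussians, two states that differ only in their weights.
model = TiedMixtureModel(
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    state_weights={"s1": [0.9, 0.1], "s2": [0.1, 0.9]},
)
scores = model.state_log_likelihoods([0.2, -0.1])
```

Because every state shares the same K component densities, `component_log_likelihoods` runs once per frame and only the inexpensive weighted sum is state-specific; this sharing of component evaluations is a commonly cited practical advantage of tied mixtures over untied Gaussian mixtures.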
Similar resources
Segment-Based Acoustic Models with Multi-level Search Algorithms for Continuous Speech Recognition
The goal of this project is to develop improved acoustic models for speaker-independent recognition of continuous speech, together with efficient search algorithms appropriate for use with these models. The current work on acoustic modelling is focussed on stochastic, segment-based models that capture the time correlation of a sequence of observations (feature vectors) that correspond to a phon...
Improvements in the Stochastic Segment Model for Phoneme Recognition
The heart of a speech recognition system is the acoustic model of sub-word units (e.g., phonemes). In this work we discuss refinements of the stochastic segment model, an alternative to hidden Markov models for representation of the acoustic variability of phonemes. We concentrate on mechanisms for better modelling time correlation of features across an entire segment. Results are presented for...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving content from this archive, is one of the key issues for this broadcasting...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...